Conditions on abruptness in a gradient-ascent Maximum Entropy learner

Author

  • Elliott Moreton
Abstract

When does a gradual learning rule translate into gradual learning performance? This paper studies a gradient-ascent Maximum Entropy phonotactic learner, with performance measured as the log-odds of success in a two-alternative forced-choice task. The main result is that slow initial performance cannot accelerate later if the initial weights are near zero, but can if they are not. Stated another way, abruptness in this learner is an effect of transfer, either from Universal Grammar in the form of an initial weighting, or from previous learning in the form of an acquired weighting.
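The dynamics at issue can be reproduced in a few lines. Below is a minimal sketch, not the paper's own code: a gradient-ascent MaxEnt learner on a toy two-candidate grammar, with 2AFC performance tracked as log-odds. The candidate set, training distribution, learning rate, and initializations are all illustrative assumptions.

```python
# Hedged sketch (not the paper's code): gradient-ascent MaxEnt learning,
# tracking two-alternative forced-choice performance as log-odds.
import numpy as np

def maxent_probs(w, F):
    """MaxEnt candidate probabilities: p(x) proportional to exp(w . f(x))."""
    scores = F @ w
    scores = scores - scores.max()      # numerical stability
    p = np.exp(scores)
    return p / p.sum()

def train(w0, F, target, rate=0.1, steps=200):
    """Gradient ascent on log-likelihood; gradient = observed - expected counts."""
    w, traj = w0.astype(float).copy(), []
    for _ in range(steps + 1):
        traj.append(w.copy())
        w = w + rate * (target @ F - maxent_probs(w, F) @ F)
    return np.array(traj)

# Toy candidate set: rows are constraint (feature) vectors f(x).
F = np.array([[1.0, 0.0],               # form a (favored in the data)
              [0.0, 1.0]])              # form b
target = np.array([0.9, 0.1])           # assumed training distribution

for w0 in (np.zeros(2), np.array([0.5, -0.5])):   # zero vs. transferred start
    traj = train(w0, F)
    log_odds = traj @ (F[0] - F[1])     # 2AFC log-odds of choosing a over b
    print(w0, "->", np.round(log_odds[[0, 10, 50, 200]], 3))
```

Comparing the two printed trajectories illustrates the question the paper answers: whether slow early growth in log-odds can speed up later, given each kind of initialization.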

Similar resources

Updating ACO Pheromones Using Stochastic Gradient Ascent and Cross-Entropy Methods

In this paper we introduce two systematic approaches, based on the stochastic gradient ascent algorithm and the cross-entropy method, for deriving pheromone update rules in the Ant Colony Optimization (ACO) metaheuristic. We discuss the relationships between the two methods, as well as their connections to the update rules previously proposed in the literature.
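As a rough illustration of the stochastic-gradient-ascent route, here is a hedged sketch, not the paper's derivation: pheromone values parameterize softmax choice probabilities, and each update follows the gradient of expected solution quality (a REINFORCE-style rule). The path structure, quality function, and step size are assumptions.

```python
# Hedged sketch: SGA-style pheromone update under a softmax parameterization.
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_options = 5, 3
tau = np.zeros((n_steps, n_options))     # pheromone "weights", one row per step
true_best = np.array([0, 2, 1, 0, 2])    # hidden optimum for the toy quality

def quality(path):
    return float(np.sum(path == true_best))   # reward: # of correct choices

for _ in range(500):
    probs = np.exp(tau) / np.exp(tau).sum(axis=1, keepdims=True)
    # one ant samples a path, one softmax choice per step
    path = np.array([rng.choice(n_options, p=probs[i]) for i in range(n_steps)])
    q = quality(path)
    # gradient of log p(path) w.r.t. tau: indicator(chosen) - probs
    grad = -probs
    grad[np.arange(n_steps), path] += 1.0
    tau += 0.1 * q * grad                # ascend expected quality
print(tau.argmax(axis=1))                # tends toward true_best
```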

Probability Density Estimation Using Entropy Maximization

We propose a method for estimating probability density functions and conditional density functions by training on data drawn from those distributions. The algorithm employs new stochastic variables that amount to a coding of the input, following a principle of entropy maximization, and is shown to be closely related to the maximum-likelihood approach. The encoding step of the algorithm provides an est…

Notes on CG and LM-BFGS Optimization of Logistic Regression

It has been recognized that the typical iterative scaling methods [?, ?] used to train logistic regression classification models (maximum entropy models) are quite slow. Goodman has suggested the use of a component-wise optimization of GIS [?], which he has measured to be faster on many tasks. However, in general, the iterative scaling methods pale in comparison to conjugate gradient ascent (fo...
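For contrast with iterative scaling, plain batch gradient ascent on the logistic-regression log-likelihood fits in a few lines; CG and LM-BFGS accelerate exactly this ascent direction. The synthetic data and step size here are assumptions, not material from the note.

```python
# Hedged sketch: batch gradient ascent on the logistic-regression
# log-likelihood, the baseline that CG / LM-BFGS methods speed up.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.random(200) < 1 / (1 + np.exp(-(X @ w_true)))).astype(float)

w = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))      # model's P(y=1 | x)
    grad = X.T @ (y - p) / len(y)       # average log-likelihood gradient
    w += 0.5 * grad                     # ascend the log-likelihood
print(np.round(w, 2))                   # should land near w_true
```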

Maximum within-cluster association

This paper presents a new method for information-theoretic clustering that exploits the minimum entropy principle and a quadratic distance measure between probability densities. We present a new minimum-entropy objective function whose optimization maximizes within-cluster association. A simple implementation using gradient ascent is given. In addition, we show…

Naive Parameter Learning for Optimality Theory - The Hidden Structure Problem

There exist a number of provably correct learning algorithms for Optimality Theory and closely related theories. These include Constraint Demotion (CD; Tesar 1995, et seq.), a family of algorithms for classic OT. For Harmonic Grammar (Legendre, Miyata and Smolensky 1990; Smolensky and Legendre 2006) and related theories (e.g. maximum entropy), there is Stochastic Gradient Ascent (SGA; Soderstro...
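As context for the SGA family, the core update for Harmonic Grammar is a perceptron-style rule: when the current weights prefer the wrong candidate, each weight moves by the difference between the loser's and the winner's violation counts. The constraint set and violation vectors below are toy assumptions.

```python
# Hedged sketch of a perceptron-style HG update; toy violation vectors.
import numpy as np

winner = np.array([0.0, 1.0])   # violations of the attested form
loser  = np.array([1.0, 0.0])   # violations of the grammar's wrong favorite
w, rate = np.zeros(2), 0.1

for _ in range(20):
    # harmony = negative weighted violation sum; error if winner doesn't win
    if -(w @ winner) <= -(w @ loser):
        w += rate * (loser - winner)    # promote constraints the loser violates
        w = np.maximum(w, 0.0)          # optional: keep weights nonnegative
print(w)                                # winner now beats loser
```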


Publication date: 2017